Week 10
Count Models and Multilevel Modelling

SSPS4102 Data Analytics in the Social Sciences
SSPS6006 Data Analytics for Social Research


Semester 1, 2026
Last updated: 2026-01-23

Francesco Bailo

Acknowledgement of Country

I would like to acknowledge the Traditional Owners of Australia and recognise their continuing connection to land, water and culture. The University of Sydney is located on the land of the Gadigal people of the Eora Nation. I pay my respects to their Elders, past and present.

Learning Objectives

By the end of this lecture, you will be able to:

  • Fit Poisson and negative binomial models for count data
  • Understand and diagnose overdispersion
  • Explain the logic of multilevel/hierarchical models
  • Choose appropriate models for different outcome types
  • Understand link functions in GLMs

This Week’s Readings

TSwD

  • Ch 13: Generalized linear models
    • 13.3 Poisson regression
    • 13.4 Negative binomial regression
    • 13.5 Multilevel modelling

ROS

  • Ch 15: Other generalized linear models

Count Data

What is Count Data?

Count data are observations that can only take non-negative integer values: 0, 1, 2, 3, …

Examples in social science:

  • Number of protest events in a country per year
  • Number of children in a household
  • Number of times a voter contacts their representative
  • Number of crimes reported in a neighbourhood
  • Number of A grades awarded in a course

Why Not Use Linear Regression?

Linear regression assumes:

  • Continuous outcome variable
  • Normally distributed errors
  • Constant variance (homoscedasticity)

Problems with count data:

  • Counts are discrete, not continuous
  • Cannot be negative (but linear regression can predict negative values!)
  • Variance often increases with the mean
  • Distribution is often skewed, especially for low counts

Key Issue

Using linear regression for count data can produce impossible predictions (like -3.5 crimes) and incorrect standard errors.
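To see the problem concretely, here is a minimal sketch (simulated data, illustrative names) in which a linear model fit to counts predicts below zero:

```r
# Sketch: a linear fit to count data can produce impossible predictions
set.seed(853)
x <- runif(100, -2, 2)
y <- rpois(100, lambda = exp(0.2 + 0.8 * x))  # true counts, never negative

fit_lm <- lm(y ~ x)
predict(fit_lm, newdata = data.frame(x = -2))
# The fitted line typically dips below zero at the low end of x
```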

The Poisson Distribution

The Poisson distribution is designed for count data. It is governed by a single parameter, λ (lambda), which represents both the mean and variance.

\[P_\lambda(k) = \frac{e^{-\lambda} \lambda^k}{k!}, \text{ for } k = 0, 1, 2, ...\]

Key property: In the Poisson distribution, the mean equals the variance.

\[E(Y) = Var(Y) = \lambda\]

Simulating from the Poisson Distribution

# Simulate 20 draws with lambda = 3
set.seed(853)
rpois(n = 20, lambda = 3)
 [1] 2 1 3 2 0 2 1 2 1 1 6 4 1 1 2 0 2 2 2 1

Poisson Regression

The Poisson Regression Model

In Poisson regression, we model the expected count as a function of predictors:

\[y_i \sim \text{Poisson}(e^{X_i\beta})\]

Or equivalently:

\[\log(\lambda_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ...\]

The Log Link Function

We use the logarithm as the link function because:

  1. It ensures predictions are always positive
  2. Coefficients have a multiplicative interpretation

Interpreting Poisson Regression Coefficients

Because we use a log link, coefficients are interpreted as multiplicative effects on the expected count.

For a coefficient β:

  • A one-unit increase in x multiplies the expected count by \(e^\beta\)
  • If β = 0.10, then \(e^{0.10} = 1.105\), meaning a 10.5% increase
  • If β = -0.20, then \(e^{-0.20} = 0.82\), meaning an 18% decrease

Quick Approximation

For small coefficients (|β| < 0.1), β × 100 approximately equals the percentage change in the expected count. For example, β = 0.05 corresponds to roughly a 5% increase.
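To make the multiplicative interpretation concrete, the examples above can be checked directly:

```r
exp(0.10)   # 1.105: a one-unit increase in x multiplies the count by ~1.105
exp(-0.20)  # 0.819: roughly an 18% decrease
exp(0.05)   # 1.051: close to 5%, matching the small-coefficient approximation
```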

Example: Simulated Poisson Data

set.seed(853)
n <- 100
x <- runif(n, -2, 2)
a <- 1
b <- 0.5
linpred <- a + b * x
y <- rpois(n, exp(linpred))
sim_data <- tibble(x = x, y = y)

ggplot(sim_data, aes(x = x, y = y)) +
  geom_point(alpha = 0.6, size = 3) +
  stat_function(
    fun = function(x) exp(a + b * x),
    colour = "red", linewidth = 1
  ) +
  labs(x = "Predictor (x)", 
       y = "Count (y)",
       title = "Simulated Poisson Data") +
  theme_minimal(base_size = 16)

Fitting Poisson Regression in R

Using glm() with family = poisson:

fit_poisson <- glm(y ~ x, family = poisson(link = "log"), data = sim_data)
summary(fit_poisson)

Call:
glm(formula = y ~ x, family = poisson(link = "log"), data = sim_data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.98083    0.06392  15.345   <2e-16 ***
x            0.51677    0.05599   9.229   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 176.535  on 99  degrees of freedom
Residual deviance:  86.251  on 98  degrees of freedom
AIC: 343.93

Number of Fisher Scoring iterations: 5

Using rstanarm for Bayesian Estimation

fit_poisson_bayes <- stan_glm(
  y ~ x,
  family = poisson(link = "log"),
  data = sim_data,
  seed = 853
)
print(fit_poisson_bayes)
stan_glm
 family:       poisson [log]
 formula:      y ~ x
 observations: 100
 predictors:   2
------
            Median MAD_SD
(Intercept) 1.0    0.1   
x           0.5    0.1   

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Example: Number of A Grades by Department

Let’s simulate data about the number of A grades awarded in university courses across three departments:

set.seed(853)
class_size <- 26

count_of_A <- tibble(
  department = c(rep("1", 26), rep("2", 26), rep("3", 26)),
  course = c(paste0("DEP_1_", letters),
             paste0("DEP_2_", letters),
             paste0("DEP_3_", letters)),
  number_of_As = c(
    rpois(n = class_size, lambda = 5),
    rpois(n = class_size, lambda = 10),
    rpois(n = class_size, lambda = 20)
  )
)

Visualising the A Grade Data
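One way to draw this slide's plot (a sketch; the binwidth and theme are arbitrary choices):

```r
# Sketch: distribution of A grades by department
ggplot(count_of_A, aes(x = number_of_As, fill = department)) +
  geom_histogram(binwidth = 1, position = "dodge") +
  labs(x = "Number of A grades",
       y = "Number of courses",
       fill = "Department") +
  theme_minimal(base_size = 16)
```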

Fitting the Model

grades_model <- stan_glm(
  number_of_As ~ department,
  data = count_of_A,
  family = poisson(link = "log"),
  seed = 853
)
print(grades_model)
stan_glm
 family:       poisson [log]
 formula:      number_of_As ~ department
 observations: 78
 predictors:   3
------
            Median MAD_SD
(Intercept) 1.3    0.1   
department2 0.9    0.1   
department3 1.7    0.1   

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Interpreting the Results

  • Intercept (1.32): Department 1 has expected count of \(e^{1.32} = 3.7\) A grades
  • department2 (0.88): Department 2 has \(e^{0.88} = 2.4\) times as many A grades as Department 1
  • department3 (1.71): Department 3 has \(e^{1.71} = 5.5\) times as many A grades as Department 1
# Expected counts per department
exp(coef(grades_model)[1])  # Dept 1
(Intercept) 
   3.746707 
exp(coef(grades_model)[1] + coef(grades_model)[2])  # Dept 2
(Intercept) 
    9.06784 
exp(coef(grades_model)[1] + coef(grades_model)[3])  # Dept 3
(Intercept) 
   20.62531 

Overdispersion

What is Overdispersion?

Overdispersion occurs when the observed variance in the data exceeds what the model predicts.

For Poisson regression:

  • The model assumes \(Var(Y) = E(Y) = \lambda\)
  • If actual \(Var(Y) > \lambda\), we have overdispersion

Why Does This Matter?

Overdispersion leads to:

  • Underestimated standard errors
  • Inflated test statistics
  • Incorrect p-values
  • Overconfident conclusions

Checking for Overdispersion

Compare the mean and variance of your count variable:

# For a Poisson distribution, mean ≈ variance
count_of_A |>
  summarise(
    mean = mean(number_of_As),
    variance = var(number_of_As),
    ratio = variance / mean
  )
# A tibble: 1 × 3
   mean variance ratio
  <dbl>    <dbl> <dbl>
1  11.2     62.3  5.57

A ratio much greater than 1 suggests overdispersion.
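A complementary, model-based check divides the Pearson chi-square statistic by the residual degrees of freedom; values well above 1 again point to overdispersion. A sketch using a quick glm() refit of the grades model:

```r
# Model-based dispersion check: Pearson chi-square / residual df
fit_check <- glm(number_of_As ~ department,
                 family = poisson, data = count_of_A)
sum(residuals(fit_check, type = "pearson")^2) / df.residual(fit_check)
# Values well above 1 indicate overdispersion
```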

Causes of Overdispersion

  1. Unobserved heterogeneity: Important variables are missing from the model

  2. Clustering: Observations within groups are correlated

  3. Excess zeros: More zeros than the Poisson distribution predicts

  4. Outliers: A few extreme values

  5. Wrong distributional assumption: The data-generating process isn’t Poisson

Negative Binomial Regression

The Negative Binomial Distribution

The negative binomial distribution adds an extra parameter to allow for overdispersion:

\[Var(Y) = E(Y) + \frac{E(Y)^2}{\phi}\]

where φ (phi) is the “reciprocal dispersion” parameter.

  • Lower φ = more overdispersion
  • As φ → ∞, the negative binomial approaches the Poisson

Flexibility

The negative binomial allows the variance to be larger than the mean, which better fits most real-world count data.

Simulating Negative Binomial Data
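A minimal sketch: draw from a Poisson and a negative binomial with the same mean and compare variances. MASS::rnegbin() parameterises the negative binomial by its mean mu and dispersion theta:

```r
set.seed(853)
pois_draws <- rpois(10000, lambda = 5)
nb_draws   <- MASS::rnegbin(10000, mu = 5, theta = 2)

# Both have mean 5, but Var(NB) = mu + mu^2 / theta = 5 + 25/2 = 17.5,
# versus Var(Poisson) = 5
c(mean(pois_draws), var(pois_draws))
c(mean(nb_draws), var(nb_draws))
```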

Fitting Negative Binomial Regression in R

Using rstanarm:

# First, let's create some overdispersed data
# rnegbin() comes from the MASS package
set.seed(853)
overdispersed_data <- tibble(
  x = runif(100, -2, 2),
  y = MASS::rnegbin(100, mu = exp(1 + 0.5 * x), theta = 2)
)

fit_nb <- stan_glm(
  y ~ x,
  family = neg_binomial_2(link = "log"),
  data = overdispersed_data,
  seed = 853
)
print(fit_nb)
stan_glm
 family:       neg_binomial_2 [log]
 formula:      y ~ x
 observations: 100
 predictors:   2
------
            Median MAD_SD
(Intercept) 0.9    0.1   
x           0.5    0.1   

Auxiliary parameter(s):
                      Median MAD_SD
reciprocal_dispersion 2.7    0.8   

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Comparing Poisson and Negative Binomial

# Fit Poisson to the same overdispersed data
fit_pois_compare <- stan_glm(
  y ~ x,
  family = poisson(link = "log"),
  data = overdispersed_data,
  seed = 853
)
Term          Poisson       Negative Binomial
(Intercept)   0.94 (0.07)   0.94 (0.09)
x             0.46 (0.06)   0.46 (0.08)

Example: Cause of Death in Alberta, Canada

Real-world example from the textbook: examining leading causes of death in Alberta.

Cause                        Min    Mean       Max    SD         Var
Heart disease                 680   1046.19    1427   201.60     40641.06
Dementia                      897   1791.10    2673   415.00    172226.99
Lung cancer                  1120   1510.52    2057   242.09     58605.26
Chronic lower respiratory     561    917.52    1120   153.17     23461.76
Stroke                        457    603.00     855   117.92     13904.60

When to Use Which Model?

Use Poisson when:

  • Mean ≈ Variance
  • Simple count data
  • No strong clustering
  • Quick preliminary analysis

Use Negative Binomial when:

  • Variance > Mean
  • Unexplained heterogeneity
  • Real-world count data
  • You need accurate standard errors

Rule of Thumb

When in doubt, start with negative binomial. It’s safer and reduces to Poisson when overdispersion is minimal.

Exposure and Offset Terms

What is Exposure?

Sometimes counts occur over different exposure periods or populations.

Examples:

  • Number of accidents per intersection (but some intersections have more traffic)
  • Number of deaths per region (but regions have different populations)
  • Number of events per time period (but observation periods differ)

Modelling Rates with Offset

To model rates rather than raw counts, we use an offset:

\[y_i \sim \text{Poisson}(u_i \cdot \theta_i)\]

where \(u_i\) is the exposure and \(\theta_i = e^{X_i\beta}\) is the rate.

Taking logs:

\[\log(\lambda_i) = \log(u_i) + X_i\beta\]

The log of the exposure, \(\log(u_i)\), is called the offset.
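A small simulated sketch of the idea (all names illustrative): deaths across regions with very different populations, modelled as a per-person rate via an offset:

```r
# Sketch: modelling a rate with an offset; names are illustrative
set.seed(853)
regions <- tibble(
  population = round(runif(200, 1000, 100000)),
  x = rnorm(200),
  deaths = rpois(200, lambda = population * exp(-6 + 0.3 * x))
)

fit_rate <- glm(deaths ~ x, family = poisson(link = "log"),
                offset = log(population), data = regions)
coef(fit_rate)  # should recover roughly -6 and 0.3
```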

Example: Roach Infestation Study

From ROS Chapter 15: A study of pest management in apartments.

  • Outcome: Number of roaches trapped (\(y_i\))
  • Exposure: Number of trap-days (\(u_i\))
  • Predictors: Pre-treatment roach level, treatment indicator, building type
fit_roaches <- stan_glm(
  y ~ roach100 + treatment + senior,
  family = neg_binomial_2(link = "log"),
  offset = log(exposure),
  data = roaches
)

Introduction to Multilevel Modelling

The Problem with Complete Pooling

So far, we’ve treated all observations as independent (complete pooling).

But data often have hierarchical structure:

  • Students within schools
  • Patients within hospitals
  • Voters within electorates
  • Time points within individuals

Why Does This Matter?

Ignoring grouping structure can lead to:

  • Underestimated standard errors
  • Incorrect inferences
  • Missed important patterns

Three Approaches

Complete Pooling

Treat all observations as one group.

Problem: Ignores group differences.

No Pooling

Fit separate models for each group.

Problem: Ignores information shared across groups.

Partial Pooling

Allow groups to differ while “borrowing strength” from other groups.

Multilevel modelling!

What is Multilevel Modelling?

Multilevel models (also called hierarchical models, random effects models) allow:

  • Intercepts to vary by group (varying intercepts)
  • Slopes to vary by group (varying slopes)
  • Groups to “borrow strength” from each other

Basic model with varying intercepts:

\[y_{ij} = \beta_0 + \alpha_j + \beta_1 x_{ij} + \epsilon_{ij}\]

where \(\alpha_j \sim N(0, \sigma^2_\alpha)\) is the random effect for group \(j\).

Example: Political Support by State

set.seed(853)
political_support <- tibble(
  state = sample(1:10, size = 500, replace = TRUE),
  gender = sample(c(0, 1), size = 500, replace = TRUE),
  noise = rnorm(n = 500, mean = 0, sd = 10),
  supports = as.numeric((state * 2 + gender * 5 + noise) > 20)
)
voter_model <- stan_glmer(
  supports ~ gender + (1 | state),
  data = political_support,
  family = binomial(link = "logit"),
  seed = 853
)

The (1 | state) term specifies varying intercepts by state.

Understanding the Output

print(voter_model)
stan_glmer
 family:       binomial [logit]
 formula:      supports ~ gender + (1 | state)
 observations: 500
------
            Median MAD_SD
(Intercept) -1.4    0.3  
gender       0.8    0.2  

Error terms:
 Groups Name        Std.Dev.
 state  (Intercept) 1       
Num. levels: state 10 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Visualising Random Effects
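One way to produce this slide's plot is to extract the state-level intercept deviations with ranef() and plot them (a sketch, assuming the voter_model fit above):

```r
state_effects <- ranef(voter_model)$state
ggplot(
  tibble(state = rownames(state_effects),
         intercept = state_effects[["(Intercept)"]]),
  aes(x = reorder(state, intercept), y = intercept)
) +
  geom_point(size = 3) +
  labs(x = "State",
       y = "Varying intercept (deviation from average)") +
  coord_flip() +
  theme_minimal(base_size = 16)
```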

When to Use Multilevel Models

Consider multilevel modelling when:

  • Data has natural grouping structure
  • You want to make inferences about groups
  • Sample sizes vary across groups
  • Groups are a sample from a larger population

Practical Advice

If you have groups with at least 5-10 observations each and want to account for group-level variation, consider multilevel modelling.

Summary: Generalised Linear Models

The GLM Framework

All the models we’ve covered fit within the Generalised Linear Model framework:

  1. Outcome distribution (data distribution)
  2. Linear predictor (\(X\beta\))
  3. Link function (connects linear predictor to expected value)

Choosing the Right Model

Outcome Type            Model                    R Function
Continuous              Linear regression        lm() or stan_glm()
Binary (0/1)            Logistic regression      glm(family = binomial) or stan_glm()
Count (no upper limit)  Poisson regression       glm(family = poisson) or stan_glm()
Count (overdispersed)   Negative binomial        MASS::glm.nb() or stan_glm(family = neg_binomial_2)
Count (with exposure)   Poisson/NB with offset   Add offset = log(exposure)

Key Takeaways

  1. Count data requires special models (Poisson or negative binomial)

  2. Overdispersion is common; the negative binomial handles it well

  3. Offset terms allow modelling rates when exposure varies

  4. Coefficients are multiplicative (exponentiate to interpret)

  5. Multilevel models account for hierarchical data structure

Remember

Always check model assumptions and use posterior predictive checks to validate your model choice!

R Functions Summary

# Poisson regression
glm(y ~ x, family = poisson(link = "log"), data = df)
stan_glm(y ~ x, family = poisson(link = "log"), data = df)

# Negative binomial regression
MASS::glm.nb(y ~ x, data = df)
stan_glm(y ~ x, family = neg_binomial_2(link = "log"), data = df)

# With offset for rates
stan_glm(y ~ x, family = poisson, offset = log(exposure), data = df)

# Multilevel model with varying intercepts
stan_glmer(y ~ x + (1 | group), family = poisson, data = df)

Next Week

Week 11: Surveys and Experimental Design

  • Principles of randomised controlled trials
  • Designing and analysing survey data
  • Survey weights and poststratification
  • Ethical foundations of experimental research

Readings:

  • TSwD Ch 8: Hunt data (8.2-8.4)
  • ROS Ch 16-17: Design and poststratification
